Identifying Real or Fake Articles: Towards better Language Modeling
Abstract
This paper casts the problem of identifying good features for improving conventional language models, such as trigram models, as a classification task. The idea is to use various syntactic and semantic features extracted from text to distinguish real-world articles from articles generated by sampling a trigram language model. High accuracy on this classification task implies that the extracted features capture aspects of the language that a trigram model does not, and such features can then be used to improve existing trigram language models. We describe the results of our experiments on the classification task, performed on a Broadcast News corpus, and discuss their implications for language modeling in general.
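The "fake" articles in this setup come from sampling a trigram model word by word. The abstract does not give implementation details, so the following is only a minimal sketch under stated assumptions: an unsmoothed maximum-likelihood model, whitespace-tokenized input, and hypothetical names (`train_trigram`, `sample_sentence`, the `<s>` and `</s>` markers) chosen for illustration. It is not the authors' code.

```python
import random
from collections import Counter, defaultdict

# Hypothetical boundary markers; the source does not specify any.
BOS, EOS = "<s>", "</s>"

def train_trigram(sentences):
    """Count next-word frequencies for every two-word history."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        padded = [BOS, BOS] + list(tokens) + [EOS]
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            counts[history][padded[i]] += 1
    return counts

def sample_sentence(counts, rng, max_len=30):
    """Draw one sentence from the unsmoothed trigram model.

    Each word is sampled in proportion to how often it followed the
    previous two words in the training data.
    """
    history = (BOS, BOS)
    out = []
    while len(out) < max_len:
        followers = counts[history]
        words = list(followers)
        weights = [followers[w] for w in words]
        word = rng.choices(words, weights=weights, k=1)[0]
        if word == EOS:
            break
        out.append(word)
        history = (history[1], word)
    return out
```

Every three-word window in a sampled sentence was seen in training, so such text looks locally fluent. The classification task relies on features that look beyond that window, which is exactly what a trigram model cannot capture.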
Related articles
Classification of Fake and Real Articles Based on Support Vector Machines
Fake or real? That is the question, even in the context of languages. In this course project, we are given the task of distinguishing real Broadcast News articles from fake “articles” generated by a trigram model trained from the 100 million word corpus of Broadcast News articles from 1992–1996. This task is clearly not difficult for humans, while machines are not as smart as us to tell whether...
Fake Variables in research
In each journal, the editorial board receives many articles but more than 70% of them are rejected. This happens because there is no real correlation among the variables in these articles or the variables and perceived relations are fake, which means playing with the variables nonexistent in reality. This rejection occurs mainly as a result of the researchers' misinterpretation of the interdisc...
Classifying Articles as Fake or Real Language and Statistics Spring 2007
Is it real or fake? That is the question. A discrimination task that may seem trivial to humans can be extremely complicated for a machine. Humans make use of a “makes sense” feature, which relies on world knowledge including Linguistics, to distinguish between real and fake articles. Unfortunately such a feature does not exist for machines yet. As such, to solve a relatively mundane problem fo...
Unsupervised Content-Based Identification of Fake News Articles with Tensor Decomposition Ensembles
Social media provide a platform for quick and seamless access to information. However, the propagation of false information, especially during the last year, raises major concerns, especially given the fact that social media are the primary source of information for a large percentage of the population. False information may manipulate people’s beliefs and have real-life consequences. Therefore,...
This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News
The problem of fake news has gained a lot of attention as it is claimed to have had a significant impact on 2016 US Presidential Elections. Fake news is not a new problem and its spread in social networks is well-studied. Often an underlying assumption in fake news discussion is that it is written to look like real news, fooling the reader who does not check for reliability of the sources or th...